Homework 2

Homework 2 and wip Final Project

Author

Julian Castoro

Published

December 23, 2022

Code

library(tidyverse) # attaches ggplot2, dplyr, tidyr and purrr, among others
library(lubridate) # date helpers
library(readxl)    # excel_sheets() and read_excel()

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Introduction

Data choice overview

I initially chose this data set because recent events abroad have escalated a global fear of war, and I wanted to see how military spending has ebbed and flowed over the period for which records exist. As I cleaned the data, more and more questions emerged from the raw information in front of me. I am excited to see what trends come from analyzing this data set.

I plan to show how different countries value military spending and how that value changes over time. Ideally I will be able to explain the patterns I see with globally or locally important historic events. How did US spending change after 9/11? More recently, did anyone bolster their defenses before news broke of the Russian invasion of Ukraine? How did spending change globally after the first and then the second world war? The data begins in 1949, so only the aftermath of the second is visible here. While I am not a statistical expert and will not be able to settle a causation vs. correlation argument, visualizations of these events and the corresponding spending patterns will still be interesting and will hopefully provoke conversation.

Data

Briefly describe the data

Below is a list of the available sheets in the SIPRI military spending data export. Several of the tabs contain the same base information expressed in a specific currency or ratio. I am choosing “Current USD” as my source of raw spending numbers, as I believe it is the easiest to understand and relate to. I will also use “Share of GDP” and “Share of Govt. spending” to put a nation's spending in context relative to the size of its economy and to its overall government spending.

Code

sheets <- excel_sheets("_data/SIPRI-Milex-data-1949-2021.xlsx")
sheets

 [1] "Front page"                     "Regional totals"               
 [3] "Local currency financial years" "Local currency calendar years" 
 [5] "Constant (2020) USD"            "Current USD"                   
 [7] "Share of GDP"                   "Per capita"                    
 [9] "Share of Govt. spending"        "Footnotes"
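The sheets used later are picked by position (sheets[6], sheets[7], sheets[9]), which would silently break if SIPRI reordered the workbook. As a minimal sketch, assuming the sheet names stay as printed above, the positions can be looked up by name instead. The vector below is copied from the output above rather than read from the file, and `wanted` and `sheet_positions` are names introduced only for this illustration.

```r
# Sheet names copied from the excel_sheets() output above
sheets <- c("Front page", "Regional totals",
            "Local currency financial years", "Local currency calendar years",
            "Constant (2020) USD", "Current USD",
            "Share of GDP", "Per capita",
            "Share of Govt. spending", "Footnotes")

# The three sheets this analysis uses (hypothetical helper vector)
wanted <- c("Current USD", "Share of GDP", "Share of Govt. spending")

# Fail early if any expected sheet is missing from the workbook
stopifnot(all(wanted %in% sheets))

# Positions of the wanted sheets within the workbook
sheet_positions <- match(wanted, sheets)
sheet_positions
```

read_excel() also accepts a sheet name directly, e.g. sheet = "Current USD", which avoids the index lookup altogether.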

Narrative and variables in the dataset

The data provides a narrative around military spending for every country where the information was accessible at the time. The variables differ from sheet to sheet, but they all reflect each country's military budget in a different form, e.g. USD, share of GDP, or per capita.

Here is a peek at what the raw information looks like.

Code

rawrawData_ShareOfGovSpend <- read_excel("_data/SIPRI-Milex-data-1949-2021.xlsx",sheet=sheets[9])
head(rawrawData_ShareOfGovSpend)

# A tibble: 6 × 37
  Military e…¹ ...2  ...3  ...4  ...5  ...6  ...7  ...8  ...9  ...10 ...11 ...12
  <chr>        <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 "Countries … <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
2 "Figures ar… <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
3 "Data for g… <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
4 "Figures in… <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
5 "\". .\" = … <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
6  <NA>        <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
# … with 25 more variables: ...13 <chr>, ...14 <chr>, ...15 <chr>, ...16 <chr>,
#   ...17 <chr>, ...18 <chr>, ...19 <chr>, ...20 <chr>, ...21 <chr>,
#   ...22 <chr>, ...23 <chr>, ...24 <chr>, ...25 <chr>, ...26 <chr>,
#   ...27 <chr>, ...28 <chr>, ...29 <chr>, ...30 <chr>, ...31 <chr>,
#   ...32 <chr>, ...33 <chr>, ...34 <chr>, ...35 <chr>, ...36 <chr>,
#   ...37 <chr>, and abbreviated variable name
#   ¹`Military expenditure by country as percentage of government spending, 1949-2021         © SIPRI 2021`

Reading in Raw information

The data provided by the Stockholm International Peace Research Institute (SIPRI) is separated into multiple tabs in one Excel .xlsx file.

To begin working, I imported the tabs I planned to use, skipping over the notes and title rows at the start of each tab.

Code

rawData_CurrentUSD <- read_excel("_data/SIPRI-Milex-data-1949-2021.xlsx",sheet=sheets[6],skip=5)
rawData_ShareOfGDP <- read_excel("_data/SIPRI-Milex-data-1949-2021.xlsx",sheet=sheets[7],skip=5)
rawData_ShareOfGovSpend <- read_excel("_data/SIPRI-Milex-data-1949-2021.xlsx",sheet=sheets[9],skip=7)



head(rawData_CurrentUSD)

# A tibble: 6 × 75
  Country   Notes `1949` `1950` `1951` `1952` `1953` `1954` `1955` `1956` `1957`
  <chr>     <chr> <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr> 
1 <NA>      <NA>  <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>  
2 Africa    <NA>  <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>  
3 North Af… <NA>  <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>  
4 Algeria   §4    xxx    xxx    xxx    xxx    xxx    xxx    xxx    xxx    xxx   
5 Libya     ‡§¶16 xxx    xxx    ...    ...    ...    ...    ...    ...    ...   
6 Morocco   §17   xxx    xxx    xxx    xxx    xxx    xxx    xxx    23.71… 35.40…
# … with 64 more variables: `1958` <chr>, `1959` <chr>, `1960` <chr>,
#   `1961` <chr>, `1962` <chr>, `1963` <chr>, `1964` <chr>, `1965` <chr>,
#   `1966` <chr>, `1967` <chr>, `1968` <chr>, `1969` <chr>, `1970` <chr>,
#   `1971` <chr>, `1972` <chr>, `1973` <chr>, `1974` <chr>, `1975` <chr>,
#   `1976` <chr>, `1977` <chr>, `1978` <chr>, `1979` <chr>, `1980` <chr>,
#   `1981` <chr>, `1982` <chr>, `1983` <chr>, `1984` <chr>, `1985` <chr>,
#   `1986` <chr>, `1987` <chr>, `1988` <chr>, `1989` <chr>, `1990` <chr>, …

Code

#head(rawData_ShareOfGDP)
#head(rawData_ShareOfGovSpend)

Next I delete the first row of NA as well as the Notes column for each tibble.

Code

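## Trying purrr here to be cleaner. We have not covered this yet, so please let me know if this could be better.

#tibbleList <- lst(rawData_CurrentUSD, rawData_ShareOfGDP, rawData_ShareOfGovSpend)

## This attempt fails because slice(-1) is evaluated immediately, with no data,
## instead of being handed to map() as a function:
#map(tibbleList, slice(-1))

## Passing the function and its argument separately should work:
#tibbleList <- map(tibbleList, slice, -1)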




rawData_CurrentUSD <- rawData_CurrentUSD[-1,]  %>%
  select(-Notes)


rawData_ShareOfGDP <- rawData_ShareOfGDP[-1,] %>%
  select(-Notes)


rawData_ShareOfGovSpend <- rawData_ShareOfGovSpend[-1,] %>%
  select(-2:-3)


head(rawData_CurrentUSD)

# A tibble: 6 × 74
  Country  `1949` `1950` `1951` `1952` `1953` `1954` `1955` `1956` `1957` `1958`
  <chr>    <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr> 
1 Africa   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>  
2 North A… <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>  
3 Algeria  xxx    xxx    xxx    xxx    xxx    xxx    xxx    xxx    xxx    xxx   
4 Libya    xxx    xxx    ...    ...    ...    ...    ...    ...    ...    ...   
5 Morocco  xxx    xxx    xxx    xxx    xxx    xxx    xxx    23.71… 35.40… 41.69…
6 Tunisia  xxx    xxx    xxx    xxx    xxx    xxx    xxx    3.714… 6.411… 9.523…
# … with 63 more variables: `1959` <chr>, `1960` <chr>, `1961` <chr>,
#   `1962` <chr>, `1963` <chr>, `1964` <chr>, `1965` <chr>, `1966` <chr>,
#   `1967` <chr>, `1968` <chr>, `1969` <chr>, `1970` <chr>, `1971` <chr>,
#   `1972` <chr>, `1973` <chr>, `1974` <chr>, `1975` <chr>, `1976` <chr>,
#   `1977` <chr>, `1978` <chr>, `1979` <chr>, `1980` <chr>, `1981` <chr>,
#   `1982` <chr>, `1983` <chr>, `1984` <chr>, `1985` <chr>, `1986` <chr>,
#   `1987` <chr>, `1988` <chr>, `1989` <chr>, `1990` <chr>, `1991` <chr>, …

Code

#head(rawData_ShareOfGDP)
#head(rawData_ShareOfGovSpend)
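The same two steps (drop the first row, drop the Notes column) can be applied to several tibbles at once with purrr. What follows is a minimal sketch on two made-up tibbles shaped like the imported sheets, not the real SIPRI data. The names `toy_usd`, `toy_gdp`, `toy_list` and `cleaned_list` are hypothetical.

```r
library(dplyr)
library(purrr)
library(tibble)

# Two made-up tibbles shaped like the imported sheets (not real SIPRI values)
toy_usd <- tibble(Country = c(NA, "Africa", "Algeria"),
                  Notes   = c(NA, NA, "4"),
                  `1949`  = c(NA, NA, "xxx"))
toy_gdp <- tibble(Country = c(NA, "Africa", "Algeria"),
                  Notes   = c(NA, NA, "4"),
                  `1949`  = c(NA, NA, "xxx"))

# lst() names each element after the object it was built from
toy_list <- lst(toy_usd, toy_gdp)

# map() needs a function; here an anonymous one applies both steps to each tibble
cleaned_list <- map(toy_list, function(df) {
  df %>%
    slice(-1) %>%      # drop the all-NA first row
    select(-Notes)     # drop the Notes column
})

cleaned_list$toy_usd
```

Because the Share of Govt. spending sheet drops two columns by position rather than a single Notes column, it would need its own step under this approach.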

Tidying Data

Before pivoting my data I wanted to add a column for region. I chose to do this with an algorithm as an exercise in iteration; it would have been more practical to just hard-code this column.

Before Pivoting I am adding a column for region

The rows where 1949 is NA, which are the region headers (they are NA for every year):

Code

Regions <- rawData_CurrentUSD %>%
  filter(is.na(`1949`))%>%
  select(Country)

Regions

# A tibble: 18 × 1
   Country                          
   <chr>                            
 1 Africa                           
 2 North Africa                     
 3 sub-Saharan Africa               
 4 Americas                         
 5 Central America and the Caribbean
 6 North America                    
 7 South America                    
 8 Asia & Oceania                   
 9 Oceania                          
10 South Asia                       
11 East Asia                        
12 South East Asia                  
13 Central Asia                     
14 Europe                           
15 Central Europe                   
16 Eastern Europe                   
17 Western Europe                   
18 Middle East

You will notice that some of these are continents and others are sub-continents. I wanted to turn them into a Region field. The pattern I saw and chose to exploit is that an overarching category is always followed directly by a more specific one, such as Africa followed by North Africa (see the earlier printouts of the raw information). As a result, each country ends up labelled with the most specific header above it.

I then added an empty vector to the tibble and populated it from the Country column using the pattern described above.

Once I had left a flag in the Region column wherever a region starts, I could use fill(Region) to populate the rest of the column.

Code

emptyRegionCol <- rep(NA_character_, nrow(rawData_CurrentUSD)) # one character NA per row


#adding col to tibble to be populated later
mutated_CurrentUSD <- rawData_CurrentUSD %>%
  mutate(Region =emptyRegionCol,.before=2)

#mutated_CurrentUSD
#head(mutated_CurrentUSD)



# Flag where each region starts: every header row (NA in 1949) passes its
# name to the row directly beneath it, so a country ends up with the most
# specific header above it (e.g. "North Africa" rather than "Africa").
# Stops one row early because the last row has no row beneath it.
for (i in seq_len(nrow(mutated_CurrentUSD) - 1)) {

  # if the row is a region header, label the next row with it
  if (is.na(mutated_CurrentUSD$`1949`[i])) {
    mutated_CurrentUSD$Region[i + 1] <- mutated_CurrentUSD$Country[i]
  }

}

mutated_CurrentUSD <- mutated_CurrentUSD %>%
  fill(Region)

mutated_CurrentUSD

# A tibble: 191 × 75
   Country Region `1949` `1950` `1951` `1952` `1953` `1954` `1955` `1956` `1957`
   <chr>   <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr> 
 1 Africa  <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>  
 2 North … Africa <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>  
 3 Algeria North… xxx    xxx    xxx    xxx    xxx    xxx    xxx    xxx    xxx   
 4 Libya   North… xxx    xxx    ...    ...    ...    ...    ...    ...    ...   
 5 Morocco North… xxx    xxx    xxx    xxx    xxx    xxx    xxx    23.71… 35.40…
 6 Tunisia North… xxx    xxx    xxx    xxx    xxx    xxx    xxx    3.714… 6.411…
 7 sub-Sa… North… <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>  
 8 Angola  sub-S… xxx    xxx    xxx    xxx    xxx    xxx    xxx    xxx    xxx   
 9 Benin   sub-S… xxx    xxx    xxx    xxx    xxx    xxx    xxx    xxx    xxx   
10 Botswa… sub-S… xxx    xxx    xxx    xxx    xxx    xxx    xxx    xxx    xxx   
# … with 181 more rows, and 64 more variables: `1958` <chr>, `1959` <chr>,
#   `1960` <chr>, `1961` <chr>, `1962` <chr>, `1963` <chr>, `1964` <chr>,
#   `1965` <chr>, `1966` <chr>, `1967` <chr>, `1968` <chr>, `1969` <chr>,
#   `1970` <chr>, `1971` <chr>, `1972` <chr>, `1973` <chr>, `1974` <chr>,
#   `1975` <chr>, `1976` <chr>, `1977` <chr>, `1978` <chr>, `1979` <chr>,
#   `1980` <chr>, `1981` <chr>, `1982` <chr>, `1983` <chr>, `1984` <chr>,
#   `1985` <chr>, `1986` <chr>, `1987` <chr>, `1988` <chr>, `1989` <chr>, …

With my tibble in this state I can now remove the header rows.

Code

mutated_CurrentUSD <- mutated_CurrentUSD %>%
  filter(!is.na(`1949`))
mutated_CurrentUSD

# A tibble: 173 × 75
   Country Region `1949` `1950` `1951` `1952` `1953` `1954` `1955` `1956` `1957`
   <chr>   <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr> 
 1 Algeria North… xxx    xxx    xxx    xxx    xxx    xxx    xxx    xxx    xxx   
 2 Libya   North… xxx    xxx    ...    ...    ...    ...    ...    ...    ...   
 3 Morocco North… xxx    xxx    xxx    xxx    xxx    xxx    xxx    23.71… 35.40…
 4 Tunisia North… xxx    xxx    xxx    xxx    xxx    xxx    xxx    3.714… 6.411…
 5 Angola  sub-S… xxx    xxx    xxx    xxx    xxx    xxx    xxx    xxx    xxx   
 6 Benin   sub-S… xxx    xxx    xxx    xxx    xxx    xxx    xxx    xxx    xxx   
 7 Botswa… sub-S… xxx    xxx    xxx    xxx    xxx    xxx    xxx    xxx    xxx   
 8 Burkin… sub-S… xxx    xxx    xxx    xxx    xxx    xxx    xxx    xxx    xxx   
 9 Burundi sub-S… xxx    xxx    xxx    xxx    xxx    xxx    xxx    xxx    xxx   
10 Camero… sub-S… xxx    xxx    xxx    xxx    xxx    xxx    xxx    xxx    xxx   
# … with 163 more rows, and 64 more variables: `1958` <chr>, `1959` <chr>,
#   `1960` <chr>, `1961` <chr>, `1962` <chr>, `1963` <chr>, `1964` <chr>,
#   `1965` <chr>, `1966` <chr>, `1967` <chr>, `1968` <chr>, `1969` <chr>,
#   `1970` <chr>, `1971` <chr>, `1972` <chr>, `1973` <chr>, `1974` <chr>,
#   `1975` <chr>, `1976` <chr>, `1977` <chr>, `1978` <chr>, `1979` <chr>,
#   `1980` <chr>, `1981` <chr>, `1982` <chr>, `1983` <chr>, `1984` <chr>,
#   `1985` <chr>, `1986` <chr>, `1987` <chr>, `1988` <chr>, `1989` <chr>, …

Pivoting years

Next I pivot the year columns into a single Year column, giving one row per country and year.

Before:

Code

#cols to pivot
head(mutated_CurrentUSD)

# A tibble: 6 × 75
  Country Region  `1949` `1950` `1951` `1952` `1953` `1954` `1955` `1956` `1957`
  <chr>   <chr>   <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr> 
1 Algeria North … xxx    xxx    xxx    xxx    xxx    xxx    xxx    xxx    xxx   
2 Libya   North … xxx    xxx    ...    ...    ...    ...    ...    ...    ...   
3 Morocco North … xxx    xxx    xxx    xxx    xxx    xxx    xxx    23.71… 35.40…
4 Tunisia North … xxx    xxx    xxx    xxx    xxx    xxx    xxx    3.714… 6.411…
5 Angola  sub-Sa… xxx    xxx    xxx    xxx    xxx    xxx    xxx    xxx    xxx   
6 Benin   sub-Sa… xxx    xxx    xxx    xxx    xxx    xxx    xxx    xxx    xxx   
# … with 64 more variables: `1958` <chr>, `1959` <chr>, `1960` <chr>,
#   `1961` <chr>, `1962` <chr>, `1963` <chr>, `1964` <chr>, `1965` <chr>,
#   `1966` <chr>, `1967` <chr>, `1968` <chr>, `1969` <chr>, `1970` <chr>,
#   `1971` <chr>, `1972` <chr>, `1973` <chr>, `1974` <chr>, `1975` <chr>,
#   `1976` <chr>, `1977` <chr>, `1978` <chr>, `1979` <chr>, `1980` <chr>,
#   `1981` <chr>, `1982` <chr>, `1983` <chr>, `1984` <chr>, `1985` <chr>,
#   `1986` <chr>, `1987` <chr>, `1988` <chr>, `1989` <chr>, `1990` <chr>, …

After:

Code

clean_currentUSD <- mutated_CurrentUSD %>%
  pivot_longer(cols=3:ncol(mutated_CurrentUSD),names_to = "Year",values_drop_na = FALSE)
  
head(clean_currentUSD)

# A tibble: 6 × 4
  Country Region       Year  value
  <chr>   <chr>        <chr> <chr>
1 Algeria North Africa 1949  xxx  
2 Algeria North Africa 1950  xxx  
3 Algeria North Africa 1951  xxx  
4 Algeria North Africa 1952  xxx  
5 Algeria North Africa 1953  xxx  
6 Algeria North Africa 1954  xxx
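To make the reshaping easier to follow, here is a minimal sketch of the same pivot on a tiny made-up table. The figures are invented for illustration, and `toy_wide` and `toy_long` are hypothetical names. It selects the year columns by excluding the identifier columns, which is an alternative to the positional 3:ncol(...) range used above.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Made-up wide table: one column per year (values are invented)
toy_wide <- tibble(
  Country = c("Morocco", "Tunisia"),
  Region  = c("North Africa", "North Africa"),
  `1956`  = c("23.7", "3.7"),
  `1957`  = c("35.4", "6.4")
)

# Every column except the identifiers becomes a (Year, value) pair
toy_long <- toy_wide %>%
  pivot_longer(cols = -c(Country, Region),
               names_to = "Year",
               values_to = "value")

toy_long
```

Two countries and two year columns give four rows, one per country and year.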

Converting notations

Now I must handle the xxx and ... notations. From the information page of my data set I can see that the notation is described as follows:

Raw Notation	Meaning
...	Data unavailable
xxx	Country did not exist or was not independent during all or part of the year in question

For now I will convert both to NA, but I am keeping clean_currentUSD, which preserves the original notation, for future use.

Code

cleanNA_currentUSD <- clean_currentUSD %>%
  # na_if() works on vectors, so apply it to the value column
  mutate(value = na_if(value, "..."),
         value = na_if(value, "xxx"))


head(cleanNA_currentUSD)

# A tibble: 6 × 4
  Country Region       Year  value
  <chr>   <chr>        <chr> <chr>
1 Algeria North Africa 1949  <NA> 
2 Algeria North Africa 1950  <NA> 
3 Algeria North Africa 1951  <NA> 
4 Algeria North Africa 1952  <NA> 
5 Algeria North Africa 1953  <NA> 
6 Algeria North Africa 1954  <NA>
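Since the two notations mean different things, one option is to record the reason in its own column before converting to NA. This is a sketch of that idea on made-up rows, not something done in this post. The column name `missing_reason` and the objects `toy_long` and `toy_flagged` are hypothetical.

```r
library(dplyr)
library(tibble)

# Made-up long-format rows using the two SIPRI notations
toy_long <- tibble(
  Country = c("Libya", "Libya", "Morocco"),
  Year    = c("1950", "1951", "1956"),
  value   = c("xxx", "...", "23.7")
)

toy_flagged <- toy_long %>%
  mutate(
    # record why the value is missing before the notation is overwritten
    missing_reason = case_when(
      value == "..." ~ "data unavailable",
      value == "xxx" ~ "country did not exist or was not independent",
      TRUE           ~ NA_character_
    ),
    # then turn both notations into NA
    value = na_if(na_if(value, "..."), "xxx")
  )

toy_flagged
```

mutate() evaluates its arguments in order, so missing_reason is computed from the original notation before value is replaced.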

Converting column types

Finally I will convert the column types to their correct representation.

Before:

Code

glimpse(cleanNA_currentUSD)

Rows: 12,629
Columns: 4
$ Country <chr> "Algeria", "Algeria", "Algeria", "Algeria", "Algeria", "Algeri…
$ Region  <chr> "North Africa", "North Africa", "North Africa", "North Africa"…
$ Year    <chr> "1949", "1950", "1951", "1952", "1953", "1954", "1955", "1956"…
$ value   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "66.43…

Code

# Year to date: make_date() pins each year to January 1
cleanNA_currentUSD$Year <- make_date(as.integer(cleanNA_currentUSD$Year))


# Wrapping this in year() would keep only the year, but as a number rather than a date.


# value to double
cleanNA_currentUSD$value <- as.numeric(cleanNA_currentUSD$value)

After:

Code

head(cleanNA_currentUSD)

# A tibble: 6 × 4
  Country Region       Year       value
  <chr>   <chr>        <date>     <dbl>
1 Algeria North Africa 1949-01-01    NA
2 Algeria North Africa 1950-01-01    NA
3 Algeria North Africa 1951-01-01    NA
4 Algeria North Africa 1952-01-01    NA
5 Algeria North Africa 1953-01-01    NA
6 Algeria North Africa 1954-01-01    NA
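The two ways of storing the year can be compared side by side. The sketch below uses made-up rows and keeps both a Date version and a plain integer version. The column names `Year_date` and `Year_int` and the objects `toy_clean` and `toy_typed` are hypothetical.

```r
library(dplyr)
library(lubridate)
library(tibble)

# Made-up rows with character columns, as they look before conversion
toy_clean <- tibble(
  Year  = c("1949", "1956"),
  value = c(NA, "23.7")
)

toy_typed <- toy_clean %>%
  mutate(
    Year_date = make_date(as.integer(Year)), # Date, pinned to January 1
    Year_int  = as.integer(Year),            # plain integer year
    value     = as.numeric(value)            # character to double, NA stays NA
  )

toy_typed
```

A Date column works naturally with date scales in ggplot2, while an integer year is simpler for arithmetic such as differences between years. Which one fits better depends on the later analysis.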

Potential research questions

Which global events may have led to increases in spending?
Grouped view of allies
Grouped view of regions
Which countries hold steady, and which decrease or increase?
Could overlay US per capita spending with public opinion of the military
Share of government spending
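As a starting point for the grouped view of regions, the cleaned long-format data could be summarised per region and year. The sketch below assumes the Country, Region, Year and value layout produced above, but it runs on invented numbers. The country names "A", "B" and "C", and the names `toy`, `regional_totals`, `total_usd_m` and `n_reporting`, are hypothetical.

```r
library(dplyr)
library(tibble)

# Invented figures in the cleaned layout (value in millions of current USD)
toy <- tibble(
  Country = c("A", "B", "A", "B", "C"),
  Region  = c("North Africa", "North Africa", "North Africa",
              "North Africa", "Middle East"),
  Year    = c(1990L, 1990L, 1991L, 1991L, 1990L),
  value   = c(10, NA, 12, 5, 7)
)

regional_totals <- toy %>%
  group_by(Region, Year) %>%
  summarise(
    total_usd_m = sum(value, na.rm = TRUE), # total over reporting countries
    n_reporting = sum(!is.na(value)),       # how many countries have data
    .groups = "drop"
  )

regional_totals
```

Keeping the count of reporting countries next to each total helps avoid reading a change in data coverage as a change in spending.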

Conclusion

“Every gun that is made, every warship launched, every rocket fired signifies in the final sense, a theft from those who hunger and are not fed, those who are cold and are not clothed.

This world in arms is not spending money alone. It is spending the sweat of its laborers, the genius of its scientists, the hopes of its children. This is not a way of life at all in any true sense. Under the clouds of war, it is humanity hanging on a cross of iron.”

Dwight D. Eisenhower

Citations: https://www.goodreads.com/quotes/tag/military-budget#:~:text=%E2%80%9CEvery%20gun%20that%20is%20made,is%20not%20spending%20money%20alone.

Appendix: Full descriptions as provided by SIPRI:

Sheet	Description
Front page	Introduction
Regional totals	Estimates of world, regional and sub-regional totals in constant (2019) US$ (billions).
Local currency financial years	Data for military expenditure by country in current price local currency, presented according to each country’s financial year.
Local currency calendar years	Data for military expenditure by country in current price local currency, presented according to calendar year.
Constant (2020) USD	Data for military expenditure by country in constant price (2019) US$ (millions), presented according to calendar year, and in current US$m. for 2020.
Current USD	Data for military expenditure by country in current US$ (millions), presented according to calendar year.
Share of GDP	Data for military expenditure by country as a share of GDP, presented according to calendar year.
Per capita	Data for military expenditure per capita, in current US$, presented according to calendar year. (1988-2020 only)
Share of Govt. spending	Data for military expenditure as a percentage of general government expenditure. (1988-2020 only)